Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify chunked octree traversals #84

Merged
merged 22 commits into from
Nov 27, 2024

Conversation

victorreijgwart
Copy link
Member

@victorreijgwart victorreijgwart commented Nov 21, 2024

Description

Release 2.0.0 introduced experimental NodePtr and NodeRef classes, enabling traversal of chunked octrees as if they were regular octrees. This PR finalizes their implementation, uses them to simplify map operations and measurement integrators, and extends the QueryAccelerator to support chunked octrees. Benchmarks confirm these changes improve code clarity while slightly enhancing performance.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that causes existing functionality to not work as expected)
  • Other (please describe):

Detailed Summary

Prior to this PR, handling chunked octrees involved explicit reasoning about chunks, relative node positions, and parent-child chunk boundaries. This approach made the code harder to read, maintain, and debug.

Release 2.0.0 introduced NodePtr and NodeRef, inspired by STL’s std::vector<bool>::bitref, to emulate the semantics of standard octree nodes for chunked octrees. However, their implementation was experimental, and the performance impact compared to directly interacting with chunks was uncertain.

This PR:

  1. Refines the semantics and implementation of NodePtr and NodeRef.
  2. Rewrites chunked octree map operations and measurement integrators using these classes.
  3. Extends the QueryAccelerator to support HashedChunkedWaveletOctrees.

Extensive benchmarks show that these changes not only simplify the codebase but also slightly improve performance.

In addition, several older code sections were cleaned up. While most changes are internal, notable updates affecting users are documented below.

API Changes

List any changes to wavemap's APIs to help users update their code. Write "None" if there are no changes.

C++ API:

  • Additions

    • The QueryAccelerator now also supports HashedChunkedWaveletOctrees
    • It is now possible to erase individual nodes from chunked octrees using ChunkedNdtree::NodeRef
  • Refactoring

    • Some type aliases (using ... = ...) nested in the Map classes have been moved or renamed

Python API:

  • Additions

    • Maps of type HashedChunkedWaveletOctree now also include fast batched accessors

ROS1 Interface:

  • Refactoring

    • The TfTransformer now returns the transform by value, and indicates lookup failures by returning std::nullopt

Review Notes

@LionelOtt A general review would be great. Let me know if some parts of the code are difficult to follow or if the purpose of certain methods or classes is unclear. Then I'll add more comments and docstrings.

Testing

Automated Tests

The NodePtr and NodeRef classes are thoroughly tested via the ndtree data structure, map, and measurement integrator test suites. Additionally, the QueryAccelerator test suite has been extended to cover QueryAccelerator<HashedChunkedWaveletOctree>.

Manual Tests

The changes were manually tested by rerunning the Demos on an AMD laptop and an Intel desktop.

Benchmarks (To be completed by maintainers)

Newer College 20cm

Model Version RAM (MB) Map size (MB) CPU time (s) Wall time (s) AUC
beam v2.1.2 153.31 13.80 92.30 41.38 0.91
beam proposed 148.79 13.80 90.23 39.80 0.91
ray v2.1.2 148.70 13.17 69.18 35.52 0.91
ray proposed 151.21 13.07 67.86 34.75 0.91

Newer College 5cm

Model Version RAM (MB) Map size (MB) CPU time (s) Wall time (s) AUC
beam v2.1.2 871.07 576.56 2384.99 194.63 0.97
beam proposed 854.34 576.62 2369.48 191.39 0.97
ray v2.1.2 826.55 409.39 1152.18 113.68 0.96
ray proposed 818.27 412.49 1138.29 110.81 0.96

Pantopic Flat 5cm

Model Version RAM (MB) Map size (MB) CPU time (s) Wall time (s) AUC
beam v2.1.2 122.03 6.44 6.42 3.59 0.99
beam proposed 121.40 6.43 6.35 3.57 0.99
ray v2.1.2 124.53 7.10 4.32 3.02 0.99
ray proposed 123.55 7.18 4.14 2.90 0.99

Pantopic Flat 2cm

Model Version RAM (MB) Map size (MB) CPU time (s) Wall time (s) AUC
beam v2.1.2 269.81 58.44 53.78 7.79 1.00
beam proposed 269.70 58.33 53.79 7.64 1.00
ray v2.1.2 261.01 61.94 25.02 4.92 1.00
ray proposed 260.38 61.85 24.65 4.86 1.00

Note that we leave out the plots comparing the accuracy-over-distance of proposed and v2.1.2, as they fully overlap. The plots for v2.1.2 can be seen in PR #83.

Checklist

General

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated tests as required
  • Any required changes in dependencies have been committed and pushed

Documentation (where applicable)

  • I have updated the installation instructions (in docs/pages/installation)
  • I have updated the code's inline API documentation (e.g., docstrings)
  • I have updated the parameter documentation (in docs/pages/parameters)
  • I have updated/extended the tutorials (in docs/pages/tutorials)

The initial implementation did not properly distinguish pointers-to-const (i.e. `const T*`) from const pointers (i.e. `T* const`), resulting in complicated code and incorrect handling of some edge cases.
The new version's code is cleaner and shorter, and more closely aligns with standard C++ syntax:
* pointer-to-const: `ChunkedNdtreeNodePtr<const YourChunkType>`
* const pointer: `const ChunkedNdtreeNodePtr<YourChunkType>`
* const pointer-to-const: `const ChunkedNdtreeNodePtr<const YourChunkType>`
Instead of first calling hasChild and then retrieving a reference with *getChild, this change directly uses the pointer returned by getChild, avoiding a double lookup and improving efficiency. With C++17's if condition init-statements, the code remains clean and easy to read.
@victorreijgwart victorreijgwart marked this pull request as draft November 21, 2024 14:34
@victorreijgwart victorreijgwart self-assigned this Nov 21, 2024
@victorreijgwart victorreijgwart added the enhancement New feature or request label Nov 21, 2024
@victorreijgwart victorreijgwart force-pushed the feature/simplify_chunked_octree_traversals branch from 7c9a2fc to 3de0323 Compare November 21, 2024 18:15
const auto num_queries = index_view.shape(0);
// Create the raw results array and wrap it in a Python capsule that
// deallocates it when all references to it expire
auto* results = new float[num_queries];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that's nanobind that wants to have raw memory fun rather than raii things?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I was also a bit surprised. Their docs seem to suggest this approach, and I couldn't find an RAII-based alternative.

@LionelOtt
Copy link
Contributor

I read through it, seems like a lot of clean-up and unification and typing changes but overall it's not too complicated to follow in my view.

@victorreijgwart victorreijgwart marked this pull request as ready for review November 22, 2024 12:06
@victorreijgwart victorreijgwart force-pushed the feature/simplify_chunked_octree_traversals branch from 5be9501 to 75f4edf Compare November 22, 2024 16:03
Copy link
Contributor

@LionelOtt LionelOtt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good.

This reverts commit 75f4edf, as we believe its changes are not yet ready for release. Tracy is configured through CMake options, part of which are converted to C++ compile definitions. Since they affect both Tracy's binary and headers, the same options/definitions must be set when compiling Tracy's system library, wavemap C++ library (CMake), and its ROS1 interface (catkin). A complicating factor is that catkin ignores standard CMake features like inheriting a linked library's PUBLIC definitions. We will reintroduce Tracy-support once we find a cleaner, less error-prone way to integrate it into the project.
@victorreijgwart
Copy link
Member Author

Thanks @LionelOtt. I'll update the Changelogs and document the API changes, and then we can merge it.

@victorreijgwart victorreijgwart merged commit 557b37a into main Nov 27, 2024
25 checks passed
@victorreijgwart victorreijgwart deleted the feature/simplify_chunked_octree_traversals branch November 27, 2024 16:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants